Rule-Based Chunking and Reusability
Abstract
In this paper we discuss a rule-based approach to chunking implemented with the LT-XML2 and LT-TTT2 tools. We describe the tools, together with the pipeline and grammars developed for the chunking task. We show that our rule-based approach is easy to adapt to different chunking styles and that mark-up of further linguistic information, such as nominal and verbal heads, can be added to the rules at little extra cost. We evaluate our chunker against the CoNLL 2000 data and discuss discrepancies between our output and the CoNLL mark-up, as well as discrepancies within the CoNLL data itself. We contrast our results with the higher scores obtained using machine learning and argue that the portability and flexibility of our approach nonetheless make it a more practical solution.
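To make the rule-based idea concrete, the sketch below is a minimal plain-Python chunker over Penn Treebank POS tags (the tag set used in the CoNLL 2000 data). It does not use LT-XML2 or LT-TTT2 and is not the authors' grammar: the tag classes in `TAG_CLASS`, the two rules in `RULES`, and the functions `chunk`, `spans` and `chunk_f1` are hypothetical illustrations. Because each rule ends in its head word, marking the nominal or verbal head costs only one extra field per chunk. The output is scored with chunk-level precision, recall and F1, where a chunk counts as correct only if both its type and its span match.

```python
import re

# Hypothetical coarse classes for Penn Treebank tags; anything unlisted becomes "O".
TAG_CLASS = {
    "DT": "D", "PDT": "D", "PRP$": "D",
    "CD": "C",
    "JJ": "J", "JJR": "J", "JJS": "J",
    "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N",
    "PRP": "P",
    "MD": "M",
    "RB": "R", "RBR": "R", "RBS": "R",
    "VB": "V", "VBD": "V", "VBG": "V", "VBN": "V", "VBP": "V", "VBZ": "V",
}

# Illustrative rules over the class string, one character per token.
# Every alternative ends in its head class (N/P for NP, V for VP),
# so the head of each chunk is simply its last token.
RULES = re.compile(r"(?P<NP>D?[CJ]*N+|P)|(?P<VP>M?R*V+)")


def chunk(tagged):
    """Return (IOB tags, [(label, start, end, head_index), ...])."""
    classes = "".join(TAG_CLASS.get(pos, "O") for _, pos in tagged)
    iob = ["O"] * len(tagged)
    chunks = []
    for m in RULES.finditer(classes):
        label = m.lastgroup  # name of the rule that matched
        start, end = m.span()
        iob[start] = "B-" + label
        for i in range(start + 1, end):
            iob[i] = "I-" + label
        chunks.append((label, start, end, end - 1))
    return iob, chunks


def spans(tags):
    """Collect (label, start, end) chunk spans from an IOB tag sequence."""
    result, start, label = set(), 0, None
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, name = tag.partition("-")
        # An open chunk ends at O, at B-, or at an I- of a different type.
        if label is not None and (prefix != "I" or name != label):
            result.add((label, start, i))
            label = None
        # A chunk opens at B-, or at an I- that does not continue one.
        if prefix in ("B", "I") and label is None:
            start, label = i, name
    return result


def chunk_f1(gold, pred):
    """Chunk-level precision, recall and F1 (type and span must both match)."""
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sent = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("has", "VBZ"),
            ("jumped", "VBN"), ("over", "IN"), ("the", "DT"), ("dog", "NN"),
            (".", ".")]
    pred, chunks = chunk(sent)
    for label, start, end, head in chunks:
        words = " ".join(w for w, _ in sent[start:end])
        print(f"[{label} {words}] head={sent[head][0]}")
    # Hand-written gold tags in CoNLL 2000 style, which also mark "over" as a PP chunk.
    gold = ["B-NP", "I-NP", "I-NP", "B-VP", "I-VP", "B-PP", "B-NP", "I-NP", "O"]
    p, r, f = chunk_f1(gold, pred)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=1.00 R=0.75 F1=0.86
```

On the example sentence the sketch finds `[NP The quick fox]`, `[VP has jumped]` and `[NP the dog]` with heads `fox`, `jumped` and `dog`, but it has no PP rule, so precision is 1.00 while recall drops to 0.75. Under these assumptions, adapting to a different chunking style amounts to editing the regular expression, for example adding a `PP` alternative or changing whether adverbs are absorbed into the VP.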
Publication date: 2006